Abstract



A central methodological problem in programming with multiple levels of abstractions
is the loosely defined problem of rep exposure. This paper traces the problem
of rep exposure to the precisely defined notion of abstract aliasing. The paper also
outlines a statically enforceable discipline for avoiding abstract aliasing, but the
outline is incomplete.



0 Introduction 



The danger of rep exposure arises when a reference to a mutable component of an
abstract data type is transferred into or out of the scope in which the representation
of the data type is hidden. The danger is that operations on the mutable component
could affect the abstract data value (or vice versa: operations on
the abstract data value could affect the value of the mutable component). In the
scope where the representation of the abstract data type is known, these interference
effects are predictable and can be reasoned about using the method of data
abstraction described by C.A.R. Hoare in 1972 [Hoa72]. But outside the scope,
they are unpredictable and seem difficult to reason about.

For example, if a stack s were implemented in terms of a sequence q[s], then
a series of push and pop operations on s would not behave as expected if there
were interleaved updates to q[s], and vice versa.

The danger of rep exposure could be avoided by prohibiting the transfer of 
mutable components across abstraction boundaries. But this is too strict: it would 
prohibit many useful programs. We mention three examples: 

• The initialization method for an abstract type may well take mutable parameters
that become part of the initialized abstract value.

• For efficiency reasons, it may be desirable to return a mutable component of
an abstract data type to a client, perhaps with restrictions on the operations
that the client can perform on the component.

• If the abstract data type is a "container class" (such as a set, sequence, or
table), the elements inserted into and removed from the container may well be
mutable. But the container would be truly useless if its methods were prohibited
from storing an in-parameter into the container or from returning an
element from the container as an out-parameter.

An effective methodology for dealing with rep exposure would allow what is
useful while preventing what is harmful. In this paper, we introduce a methodology
that we think is a step toward this goal. Our methodology is not a full solution, but
it applies to many example programs that we have studied, it is simple, and it can
be enforced mechanically.

Our approach builds on our previous work in reasoning about modular verification
in terms of dependencies, as introduced by Leino in his Ph.D. thesis [Lei95]
and extended by Leino and Nelson [LN98a]. We will try to make this paper accessible
to readers who don't know about dependencies, by defining the relevant
terms as we need them.
1 Definitions



A program is a collection of declarations. Declarations introduce names for entities
(such as types, abstract and concrete data fields, methods) and/or specify
properties of named entities (such as subtype relationships, representations of abstract
fields, method specifications and method implementations). The declarations
of a program are partitioned into units (sometimes called interfaces and modules).
The declarations visible (that is, in scope) in a unit are its own declarations and the
declarations visible in units that it imports.

We consider a data field, abstract or concrete, to be a map from objects to
values. Thus, where others write

class T = { ... f: int ... }

we write

type T

var f: T -> int

Also, we write f[o] where others write o.f. This semantics models an implementation
in which objects are references to data records containing field values, and
in which two objects are equal when they reference the same data record.
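To make the map view concrete, here is a minimal Python sketch, assuming that objects are modeled as plain Python objects compared by identity. The class names Obj and Field are hypothetical illustrations, not part of the paper's notation.

```python
class Obj:
    """An opaque object reference; equality and hashing are by identity."""
    __slots__ = ()


class Field:
    """A data field viewed as a map from objects to values: f[o] plays the role of o.f."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default
        self._values = {}  # object -> value, keyed by object identity

    def __getitem__(self, o):
        return self._values.get(o, self.default)

    def __setitem__(self, o, value):
        self._values[o] = value


f = Field("f", default=0)
o1, o2 = Obj(), Obj()
f[o1] = 7
# o1 and o2 reference different records, so they are unequal and have independent f values.
print(o1 == o2, f[o1], f[o2])  # False 7 0
```

Because Obj inherits identity-based equality, two objects are equal exactly when they are the same reference, matching the semantics described above.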

A data field can be declared to be abstract by preceding its declaration with
spec. For example,

spec var valid: T -> bool 

An abstract field occupies no memory at run-time; it is a fictitious field whose
value (or representation) is later given in terms of other fields. This representation
is declared by a syntax like

rep valid[t: T] = f[t] ≠ 0     (0)

The variables appearing in the right-hand side of the rep construct for an abstract 
variable are called the dependencies of the abstract variable. The dependencies can 
themselves be either concrete or abstract. 

We require that dependencies be declared explicitly. For example, the representation
(0) would cause a static error unless f[t] were declared as a dependency
of valid[t], which is done by a declaration of the form

depends valid[t: T] on f[t] 

We allow only two forms of dependencies in this paper: static dependencies of 
the form 

depends a[t: T] on c[t] (1) 
and dynamic dependencies of the form

depends a[t: T] on c[b[t]]     (2)

where b is a concrete field. If all the dependencies in a program are static, then the
representation of each abstraction is confined to the fields of a single object, but if
the program contains dynamic dependencies, then some abstraction's representation
includes fields of multiple objects connected by references (like the field b in
(2)). We call the field b a pivot field.
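The two dependency forms, and the way a pivot field is identified, can be sketched in Python as follows. The Depends record and the pivot_fields helper are hypothetical illustrations, assuming a dependency is represented by the names of its abstract field, its dependee field, and (for dynamic dependencies only) its pivot field.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Depends:
    """depends a[t] on c[t] when pivot is None (static), or
    depends a[t] on c[b[t]] when pivot is 'b' (dynamic)."""
    abstract: str
    dependee: str
    pivot: Optional[str] = None

    def is_dynamic(self) -> bool:
        return self.pivot is not None


def pivot_fields(deps) -> Set[str]:
    """A concrete field is a pivot field exactly when it links a dynamic dependency."""
    return {d.pivot for d in deps if d.is_dynamic()}


deps = [
    Depends("valid", "f"),           # depends valid[t] on f[t]   (static, form (1))
    Depends("a", "c", pivot="b"),    # depends a[t] on c[b[t]]    (dynamic, form (2))
]
print(sorted(pivot_fields(deps)))  # ['b']
```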

We impose the rule that the static dependency (1) be visible wherever c is, and 
the dynamic dependency (2) be visible wherever b is. These rules seem necessary 
for modular soundness, as explained in our companion paper [LN98a]. Because of 
the rule that a dynamic dependency must be visible anywhere its pivot field is, it 
follows that in any scope where a field is visible, it is known whether the field is a 
pivot field or not. 

Dependencies affect the verification process. For example, in a scope where
depends a[t] on c[t] is visible, a procedure call that is known to change a[t] is
assumed by the verifier to possibly change c[t]. But unless the actual rep clause
for a[t] is also visible, nothing can be assumed about the nature of the change to
c[t]. Thus dependencies are abstractions of rep clauses: they specify what is part
of the representation of what, but they hide the explicit nature of the representation.
This makes them valuable in dealing with rep exposure.

The distinction between static and dynamic dependencies allows us to give a
more precise account of rep exposure: only the mutable components corresponding
to dynamic dependencies are dangerous (because with multiple objects, unexpected
interference may occur in scopes where the pivot field and dependency
are not visible); those associated with static dependencies are not (because with a
single object, there is no pivot field to be out of scope).

To prevent harmful rep exposure, we claim that it suffices to prevent abstract
aliasing, which roughly means to prevent values of a pivot field from escaping
the scope where the field is declared. The precise definition of abstract aliasing
is somewhat odd, since it involves both the static program text and dynamic
possibility. We say that a[E] and c[F] are directly abstractly aliased at some
dynamic execution point if F = b[E] ∧ F ≠ nil holds for some dependency
depends a[t] on c[b[t]] that is not in scope at the corresponding program point
(which implies that b is not in scope either at that program point). Notice that the
condition F = b[E] makes sense even outside the scope of b, since b's value exists
even at an execution point where b is not visible at the corresponding program
point. Two expressions of the form a[E] and c[F] are abstractly aliased at some
dynamic execution point if they are related by the transitive closure of the direct
abstract aliasing relation for that execution point, and all free variables of a[E] and
c[F] are in scope at the corresponding program point. In a program without information
hiding, where all variables are in scope everywhere, abstract aliasing never
occurs.
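For a single heap snapshot, the direct case of this definition can be sketched in Python. This is an illustration under simplifying assumptions: the heap is a dict from field names to dicts from objects to values, None plays the role of nil, only dynamic dependencies are listed, and the set of dependencies in scope is supplied by hand. The function directly_aliased is hypothetical, and it checks only direct abstract aliasing, not the transitive closure or the free-variable condition.

```python
from dataclasses import dataclass

NIL = None


@dataclass(frozen=True)
class Depends:
    """The dynamic dependency: depends a[t] on c[b[t]]."""
    abstract: str  # a
    dependee: str  # c
    pivot: str     # b


def directly_aliased(a, e, c, f, heap, deps, in_scope):
    """True if a[e] and c[f] are directly abstractly aliased in this state:
    f = b[e] and f != nil for some dependency that is not in scope here."""
    for d in deps:
        if d.abstract != a or d.dependee != c or d in in_scope:
            continue
        if f is not NIL and heap.get(d.pivot, {}).get(e) == f:
            return True
    return False


s, x = object(), object()
dep = Depends("a", "c", "b")
heap = {"b": {s: x}}  # b[s] = x
print(directly_aliased("a", s, "c", x, heap, [dep], in_scope=set()))  # True
print(directly_aliased("a", s, "c", x, heap, [dep], in_scope={dep}))  # False
```

The second call shows the key point of the definition: once the dependency is in scope, the same heap state no longer counts as abstract aliasing.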

2 Example 

As an example that will be useful here and later in the paper, we will consider a 
design of a lexer abstraction built on top of a reader abstraction. 

A reader models an input stream. Here is a part of the unit declaring the reader 
abstraction: 

unit Rd
type Rd.T

spec var rvalid: Rd.T -> bool

proc GetChar(rd: Rd.T): char
  requires rvalid[rd]

proc Close(rd: Rd.T)
  modifies rvalid[rd]

This declares a type Rd.T and a boolean-valued abstract field rvalid that records
whether objects of that type are valid. Typically, there will be many operations that
require and preserve validity, of which we show one example, GetChar. We have
also shown one procedure, Close, that destroys validity. The interface does not declare
the fields that are relevant only to the implementation of readers. These fields
would be declared in another unit, an implementation unit. The implementation
unit would also provide the representation of the abstract field rvalid.

A lexer is an abstraction that returns lexical tokens. The interface to lexers is 
very similar to that of readers: 

unit Lexer import Token
type Lexer.T

spec var lvalid: Lexer.T -> bool

proc GetToken(lx: Lexer.T): Token.T
  requires lvalid[lx]

The implementation of lexers contains a variety of fields, of which we show
one, rdr, which is a reference to the reader that supplies the stream to be converted
into tokens:

unit LexerImpl import Lexer, Rd, Token
var rdr: Lexer.T -> Rd.T
depends lvalid[lx: Lexer.T] on rvalid[rdr[lx]]
rep lvalid[lx: Lexer.T] = rvalid[rdr[lx]] ∧ ...

The idea is that the reader rdr[lx] supplies the character stream that is tokenized
by the lexer lx. We omit the other data fields of lexers, and the part of the representation
of lvalid that concerns these fields.

In fact, the field rdr is a pivot field, since we need the conjunct rvalid[rdr[lx]]
in the representation of lvalid[lx], and therefore there is a dynamic dependency of
lvalid on rvalid.

Envision a situation where the lexer interface provides a procedure, P, that
returns the associated reader. A client could then use procedure P to retrieve the
reader of a lexer, close the reader, and then operate on the lexer:

given lvalid[lx],
rd := P(lx) ;
Close(rd) ;
tok := GetToken(lx)

This would go wrong at run-time (because the lexer is invalidated by the closing of
the reader), but a modular verifier would miss the error. Note that in this program
fragment, abstract aliasing occurs as defined in Section 1: after the line
rd := P(lx), the condition rd = rdr[lx] ∧ rd ≠ nil holds, and (we assume) this is a
scope where the rdr field is not visible. (If the rdr field were visible, then the
dependency would be too, and a modular verifier would not miss the error.)

Informally, we say that P causes simple upward leaking: it creates the possibility
of abstract aliasing by "leaking" the value of a pivot field by directly returning
it to a caller outside the scope of the field.
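The run-time failure can be reproduced in a small Python simulation. This is only an analogy under stated assumptions: the classes Reader and Lexer and their tokenizing logic are hypothetical, rvalid is modeled as an ordinary flag, and Python assertions stand in for the preconditions that a static checker would try to prove.

```python
class Reader:
    """Hypothetical reader; rvalid[rd] is modeled as an ordinary boolean attribute."""

    def __init__(self, text):
        self._chars = iter(text)
        self.rvalid = True

    def get_char(self):
        assert self.rvalid, "requires rvalid[rd]"
        return next(self._chars, "")  # "" signals end of stream

    def close(self):
        self.rvalid = False  # modifies rvalid[rd]


class Lexer:
    def __init__(self, rd):
        self._rdr = rd  # the pivot field rdr, meant to be hidden from clients

    @property
    def lvalid(self):
        # rep lvalid[lx] = rvalid[rdr[lx]] ∧ ...  (the other conjuncts are omitted)
        return self._rdr.rvalid

    def get_token(self):
        assert self.lvalid, "requires lvalid[lx]"
        token = ""
        while True:
            ch = self._rdr.get_char()
            if ch == "" or ch.isspace():
                return token
            token += ch


def P(lx):
    """Model of the leaking procedure P: it returns the pivot rdr[lx]."""
    return lx._rdr


lx = Lexer(Reader("let x"))
rd = P(lx)   # now rd = rdr[lx] ∧ rd ≠ nil
rd.close()   # invalidates lx as a side effect
try:
    lx.get_token()
except AssertionError as err:
    print("precondition failed:", err)  # precondition failed: requires lvalid[lx]
```

A checker that sees only the specification of Close (which mentions rvalid[rd], not lvalid[lx]) has no reason to suspect that the final call violates its precondition; the simulation only exposes the problem by running the code.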

3 Practical basis 

We have not yet implemented our methodology. We had planned to do so in the
context of the Modula-3 Extended Static Checker (ESC) [DLNS98, LN98b, Det96,
ESC], but this plan was not completed. In the meantime, the design of our methodology
was shaped by the following programs:
• The Modula-3 object-oriented buffered streams package (readers and writers;
about a thousand lines of Modula-3; originally written in Modula-2+ by
Kai Li and Butler Lampson) [BN91].

• The NI2 indexing library (about 20,000 lines of C code written by Mike 
Burrows; the heart of the AltaVista World-Wide Web Index). 

Both of these libraries use object-oriented programming techniques to implement
abstract data types. Both of them hide the representation of their abstract types,
and both of them occasionally transfer mutable components of the hidden representation
into and out of the scope in which the representations are hidden, thus
risking rep exposure.

We have annotated the I/O streams library and checked it with ESC. Since the
methodology of this paper was not implemented, we never mechanically verified
the absence of abstract aliasing, but we did mechanically check the package for
many other errors, including race conditions, deadlocks, array index errors, and
nil-dereference errors. Thus we have identified all the pivot fields and are aware of
all places in which mutable components of abstract types cross abstraction boundaries.
We believe that our methodology for avoiding abstract aliasing is flexible
enough to handle this library.

Mike Burrows has checked the NI2 indexing library with LCLint [Eva96], 
which warns about transfers of mutable components across abstraction boundaries. 
He found that these warnings were not indicative of real errors, and therefore used 
the "-repexpose" flag to LCLint, which globally suppresses all warnings of this 
type. We ran LCLint without this flag and examined the warnings that it reported. 
We believe this showed us the parts of the program relevant to abstract aliasing, 
while saving us the need to read all 20,000 lines. It is possible, but not certain, that 
our methodology for avoiding abstract aliasing is flexible enough to handle this 
library. 

The experience of Mike Burrows with LCLint and NI2 suggests that abstract 
aliasing may be more of a theoretical than a practical problem. It is clear that rep 
exposure makes simple program proof systems unsound. But it is not clear whether 
inadvertent rep exposure is a common source of errors. 

4 Methodology 

This section describes our methodology informally. 

To prevent simple upward leaking, we impose the somewhat drastic restriction 
that no return value (or, more generally, out-parameter) of any procedure is allowed 
to be the value of any pivot field. 
Slightly more subtly, a procedure could leak the value of a pivot field by assigning
it to a global variable or to some field of some other object. We defend
against this with another drastic restriction: at any instant, the values of pivot fields
are not allowed to overlap with the values of global variables, nor with the values
of non-pivot fields. We call this restriction apartheid. For brevity in the future,
a non-pivot location means a global variable or non-pivot field. Thus, apartheid
forbids any overlap between the values of pivot fields and the values of non-pivot
locations.
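As an illustration of what apartheid forbids, the following Python sketch checks a single heap snapshot. The function name apartheid_violations and the representation (fields as dicts from objects to values, global variables as a dict, None for nil) are assumptions made for this sketch; the methodology itself enforces the rule statically rather than by inspecting states.

```python
def apartheid_violations(pivot_fields, nonpivot_fields, global_vars, heap):
    """Return the non-nil values shared between some pivot field and
    some non-pivot location (a global variable or a non-pivot field)."""
    pivot_values = {v for b in pivot_fields
                    for v in heap.get(b, {}).values() if v is not None}
    nonpivot_values = {v for f in nonpivot_fields
                       for v in heap.get(f, {}).values() if v is not None}
    nonpivot_values |= {v for v in global_vars.values() if v is not None}
    return pivot_values & nonpivot_values


lx, rd, other = object(), object(), object()
heap = {"rdr": {lx: rd}, "next": {lx: other}}  # rdr is a pivot field; next (hypothetical) is not
print(len(apartheid_violations({"rdr"}, {"next"}, {"g": None}, heap)))  # 0
print(len(apartheid_violations({"rdr"}, {"next"}, {"g": rd}, heap)))    # 1: rd leaked into g
```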

Let us say that a procedure captures an in-parameter if it assigns the parameter
to some global variable or field (pivot or not). In order to enforce apartheid,
each procedure specification will have to disclose whether it captures any of its parameters.
For a formal parameter that is captured into a pivot field, the procedure
specification must require the value of the parameter to be distinct from the values
of all non-pivot locations. Otherwise, the checker would complain that the procedure
body may violate apartheid. Similarly, for a formal parameter that is captured
into a non-pivot location, the specification must require the value of the parameter
to be distinct from the values of all pivot fields. Our methodology includes a
captures notation to make it convenient to write these kinds of specifications.

So far our methodology allows a computation to transfer a value between a 
pivot field and a non-pivot location, so long as no value is simultaneously shared 
between a pivot field and a non-pivot location. We have noticed that this freedom 
is unused in the example programs that we have encountered, and it appears to 
us that giving up this freedom simplifies our methodology. Therefore, we make 
another drastic rule, which we call monomorphism. The rule is that once an object 
becomes the value of a pivot field, it is no longer allowed ever to become the 
value of a non-pivot location, and vice versa. This rule is reminiscent of Leino 
and Stata's technique for keeping track of which objects have reference count zero 
without keeping track of the exact reference count of objects [LS97]. 

In light of the monomorphism rule, we make the following definitions. At a 
particular state in a computation, we say that a value is a pivot if it is or has been 
the value of some pivot field, that it is plenary if it is or has been the value of some 
non-pivot location, and that it is virgin otherwise. 

Our methodology also includes what we call the disjoint ranges requirement.
It states that pivot fields declared in distinct scopes have disjoint ranges. That
is, if b and d are pivot fields whose declarations occur in different scopes, then
b[s] = d[t] ∧ b[s] ≠ nil is forbidden, for any s and t. The justification for
this requirement is rather technical and is explained in our companion paper on
dependencies [LN98a].
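The disjoint ranges requirement can likewise be illustrated on one heap snapshot. In this Python sketch, which is an illustration rather than the paper's enforcement mechanism, scope_of maps each pivot field to the unit that declares it; the second pivot field src and the unit Parser are hypothetical.

```python
from itertools import combinations


def disjoint_ranges_violations(scope_of, heap):
    """Return pairs (b, d) of pivot fields declared in distinct scopes for which
    b[s] = d[t] and b[s] != nil for some objects s and t."""
    bad = []
    for b, d in combinations(sorted(scope_of), 2):
        if scope_of[b] == scope_of[d]:
            continue  # the requirement only constrains fields from distinct scopes
        range_b = {v for v in heap.get(b, {}).values() if v is not None}
        range_d = {v for v in heap.get(d, {}).values() if v is not None}
        if range_b & range_d:
            bad.append((b, d))
    return bad


lx, parser, rd = object(), object(), object()
scope_of = {"rdr": "Lexer", "src": "Parser"}   # src and Parser are hypothetical
heap = {"rdr": {lx: rd}, "src": {parser: rd}}  # both pivots hold the same reader
print(disjoint_ranges_violations(scope_of, heap))  # [('rdr', 'src')]
```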
5 Enforcing the methodology



In this section, we describe our methodology more formally, and also describe
how to enforce it mechanically. Given as input a program annotated with specifications,
we show how to transform it into another annotated program that will
verify exactly when the input program would verify and the input program obeys
our methodology.

For our purposes, it is not necessary to describe the specification language in
any detail. We assume the reader is familiar with pre- and postconditions and modifies
clauses, which we introduce with the Larch keywords requires, ensures,
and modifies (see, for example, the CLU book [LG86]).

The methodology is enforced in three steps: we introduce some special fields, 
we transform the input program, and we transform the input specifications. 

It is a consequence of our methodology that an object starts off being virgin and
can transition into being either plenary or a pivot, but not both. To keep track of
these transitions, we introduce three special boolean-valued fields: virgin, pivot,
and plenary. These fields are special in that they are used only in describing
the semantics of programs: the input program cannot read or write them directly.
Furthermore, the fields pivot and plenary are not allowed to occur directly in
specifications; they can be introduced into specifications only using the constructs
described in this section.

We transform the input program to an equivalent program that keeps the special
fields up-to-date. The following table shows how the transformation is done. In
the table, T denotes any object type, E any object-valued expression, b any pivot
field, f any non-pivot field, g any global variable, and o any object.



input program       transformed program

o := new(T)         o := new(T) ; virgin[o] := true ;
                    pivot[o] := false ; plenary[o] := false
b[o] := E           b[o] := E ; virgin[b[o]] := false ; pivot[b[o]] := true
f[o] := E           f[o] := E ; virgin[f[o]] := false ; plenary[f[o]] := true
g := E              g := E ; virgin[g] := false ; plenary[g] := true
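The transformation in the table can be mimicked by an interpreter that updates the special fields whenever it performs an assignment. The Python sketch below is a hypothetical model, assuming objects are Python objects and None is nil; for simplicity it skips the updates when the assigned value is nil instead of relying on boundary assumptions for nil. It only tracks the special fields and does not itself enforce apartheid. It also checks the relationship between virgin, pivot, and plenary that the transformation maintains.

```python
class InstrumentedHeap:
    """Performs assignments the way the transformed program does, keeping the
    special fields virgin, pivot and plenary up to date."""

    def __init__(self, pivot_fields):
        self.pivot_fields = set(pivot_fields)
        self.fields = {}   # field name -> {object: value}
        self.globals = {}  # global variable name -> value
        self.virgin, self.pivot, self.plenary = {}, {}, {}

    def new(self):
        # o := new(T) ; virgin[o] := true ; pivot[o] := false ; plenary[o] := false
        o = object()
        self.virgin[o], self.pivot[o], self.plenary[o] = True, False, False
        return o

    def _mark(self, value, as_pivot):
        if value is None:
            return  # nil is skipped in this sketch
        self.virgin[value] = False
        if as_pivot:
            self.pivot[value] = True    # ... ; pivot[b[o]] := true
        else:
            self.plenary[value] = True  # ... ; plenary[f[o]] := true (or plenary[g])

    def set_field(self, name, o, value):
        # b[o] := E or f[o] := E, followed by the bookkeeping from the table
        self.fields.setdefault(name, {})[o] = value
        self._mark(value, as_pivot=name in self.pivot_fields)

    def set_global(self, name, value):
        # g := E ; virgin[g] := false ; plenary[g] := true
        self.globals[name] = value
        self._mark(value, as_pivot=False)

    def virgin_invariant_holds(self):
        # virgin[o] ≡ ¬(pivot[o] ∨ plenary[o]) for every object seen so far
        return all(
            self.virgin[o] == (not (self.pivot.get(o, False) or self.plenary.get(o, False)))
            for o in self.virgin
        )


h = InstrumentedHeap(pivot_fields={"rdr"})
lx, rd = h.new(), h.new()
h.set_field("rdr", lx, rd)  # rd becomes a pivot
print(h.virgin[lx], h.virgin[rd], h.pivot[rd], h.plenary[rd])  # True False True False
print(h.virgin_invariant_holds())                               # True
```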



As a consequence of these transformations, the following predicates are guaranteed
to hold at all control points in the input program. In these formulas, o
ranges over non-nil objects, and b, f, and g are used as in the table above.

$$\langle \forall o :: \mathit{virgin}[o] \equiv \neg(\mathit{pivot}[o] \lor \mathit{plenary}[o]) \rangle$$

$$\langle \forall o :: \mathit{pivot}[b[o]] \rangle \land \langle \forall o :: \mathit{plenary}[f[o]] \rangle \land \mathit{plenary}[g]$$

(Since b[o] and g may be nil, we introduce the boundary assumptions pivot[nil]
and plenary[nil].)

The third step in enforcing our methodology is to transform the specifications. 
This involves desugaring each captures clause and inserting extra conditions in 
pre- and postconditions. 

There are two forms of captures clauses. The first form is 

captures o 

where o is an in-parameter. It desugars into 

requires o = nil ∨ ¬pivot[o]
modifies virgin[o], plenary[o]

Thus, if a procedure specification includes captures o, the procedure implementation
is assured that o is not a pivot and is allowed to capture o into a non-pivot
location.

The second form is 

captures o into b[t] 

where o is an in-parameter, b is a pivot field, and t is an expression. It desugars 
into 

requires o = nil ∨ virgin[o]
modifies virgin[o], pivot[o], b[t] 
ensures b'[t] = o 

where b' denotes the value of b in the post-state. Thus, if a procedure specification
includes captures o into b[t], the procedure implementation is assured that o
is virgin and is constrained to capture o into the specific pivot field b[t].
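Because both desugarings are purely syntactic, they can be sketched as a small text transformation. The Python function desugar_captures below is hypothetical; it assumes the pivot-field target is written in the form b[t] and produces the clauses in the paper's notation.

```python
from typing import List, Optional


def desugar_captures(o: str, into: Optional[str] = None) -> List[str]:
    """Expand 'captures o' (into is None) or 'captures o into b[t]' (into = 'b[t]')."""
    if into is None:
        return [f"requires {o} = nil ∨ ¬pivot[{o}]",
                f"modifies virgin[{o}], plenary[{o}]"]
    b, _, rest = into.partition("[")  # 'b[t]' -> 'b', '[', 't]'
    return [f"requires {o} = nil ∨ virgin[{o}]",
            f"modifies virgin[{o}], pivot[{o}], {into}",
            f"ensures {b}'[{rest} = {o}"]


for clause in desugar_captures("rd", into="rdr[lx]"):
    print(clause)
# requires rd = nil ∨ virgin[rd]
# modifies virgin[rd], pivot[rd], rdr[lx]
# ensures rdr'[lx] = rd
```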

The captures into annotation can be viewed as a more precise version of 
LCLint's annotation exposed. Thus, we write 

proc m(v, x) 

captures x into b[v] 

where in LCLint one would write 

proc m(v, /*@exposed@*/ x) 

Perhaps surprisingly, captures into is not a strengthening of captures: the
latter is used when a parameter is captured into a non-pivot location, the former
when a parameter is captured into a pivot field. The reason for the asymmetry is
as follows. When a parameter is captured into a pivot field, the soundness of the
methodology requires explicit mention of the pivot field in the specification (as
we shall see below). But when a parameter is captured into a non-pivot location,
explicit mention of the non-pivot location is not necessary, and typically not useful.

If there is no captures clause for some parameter, a caller can pass a pivot as 
the corresponding actual parameter: since the parameter is not captured, there is 
no danger that the procedure call would violate apartheid. 

All that remains is to strengthen the pre- and postconditions of the input program.

We strengthen each procedure's postcondition by a conjunct

$$r = \mathrm{nil} \lor \neg\,\mathit{pivot}'[r] \qquad (3)$$

for each of its out-parameters r (including the return value). This enforces the rule
against returning pivots.

Finally, we add invariants as conjuncts to all pre- and postconditions. To enforce
apartheid, we add

$$\langle \forall o :: \neg(\mathit{pivot}[o] \land \mathit{plenary}[o]) \rangle$$

To enforce the disjoint ranges requirement, we add

$$\langle \forall s, t :: b[s] = d[t] \Rightarrow b[s] = \mathrm{nil} \rangle$$

for each pair of pivot fields b and d that are visible and whose declarations are in
distinct scopes. In these formulas, o, s, and t range over non-nil objects.

Note that the apartheid invariant mentions only the fields pivot and plenary,
both of which change monotonically (only from false to true) in any execution.
Therefore it is impossible for a procedure to temporarily violate and then restore
apartheid. This observation permits the invariant to be checked incrementally,
instead of at procedure boundaries: one can check ¬pivot[x] at every update of
plenary[x] and check ¬plenary[x] at every update of pivot[x].

The disjoint ranges requirement, by contrast, mentions the pivot fields themselves,
whose values can change arbitrarily, so it may be temporarily violated and later
restored; it is therefore enforced only at procedure boundaries.
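Since pivot and plenary only ever change from false to true, the incremental check can be sketched as a tracker that tests the opposite field at each update. The class ApartheidTracker below is a hypothetical Python illustration of that idea, using sets for the objects whose pivot or plenary field is true and None for nil.

```python
class ApartheidTracker:
    """Checks apartheid incrementally: pivot and plenary are monotonic, so a
    violation can only arise at the moment one of them is set."""

    def __init__(self):
        self.pivots = set()     # objects x with pivot[x] = true
        self.plenaries = set()  # objects x with plenary[x] = true

    def set_pivot(self, x):
        if x is None:
            return
        if x in self.plenaries:  # check ¬plenary[x] at every update of pivot[x]
            raise ValueError("apartheid violated: a plenary value would become a pivot")
        self.pivots.add(x)

    def set_plenary(self, x):
        if x is None:
            return
        if x in self.pivots:  # check ¬pivot[x] at every update of plenary[x]
            raise ValueError("apartheid violated: a pivot would become plenary")
        self.plenaries.add(x)


t = ApartheidTracker()
rd = object()
t.set_pivot(rd)        # for example, rdr[lx] := rd
try:
    t.set_plenary(rd)  # for example, g := rd
except ValueError as err:
    print(err)  # apartheid violated: a pivot would become plenary
```

No state is ever removed from either set, which is exactly why a check at the moment of each update suffices.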

6 What we have achieved 

The contribution of our methodology is that it allows passing a pivot value across 
an abstraction boundary in cases where this is useful. In this section, we illustrate 
this contribution by continuing the lexer/reader example. 

Recall that each lexer lx contains a reader rdr[lx] that supplies the characters
that lx tokenizes. We argue that this reader should be a parameter to the lexer
initialization method, so that, for example, a lexer from a file could be constructed
as follows:

rd := File.Open("/etc/passwd") ;
lx := Lexer.Init(new(Lexer.T), rd)     (4)

The alternative would be to make the lexer implementation itself allocate the reader. 
But then it could support only a fixed number of reader subtypes. By making the 
reader a parameter to the lexer initialization method, any reader subtype can be 
used, even subtypes that were not envisioned at the time the lexer implementation 
was coded. This is one of the key advantages of object-oriented programming. 

Since the lexer initialization method captures its reader parameter, our methodology
forces this fact to be disclosed in its specification:

proc Lexer.Init(lx: Lexer.T; rd: Rd.T): Lexer.T
  requires lx ≠ nil ∧ virgin[lx] ∧ rvalid[rd]
  captures rd into rdr[lx]
  modifies lvalid[lx]
  ensures lvalid'[lx] ∧ result = lx

(The precondition virgin[lx] is not absolutely necessary, but it is convenient as will
be explained in Section 8.) Lexer.Init returns the lexer that it initializes, following
a common convention for initialization methods.

Because Lexer.Init captures its rd parameter into a pivot field, its specification
must use the second form of captures, in which the pivot field rdr is mentioned
explicitly. Consequently, the field must be visible in the interface, rather
than hidden in the implementation. Therefore, we must move the declaration of
the rdr field from the unit LexerImpl to the unit Lexer. Because of our rules for
the placement of dependencies, the declaration

depends lvalid[lx: Lexer.T] on rvalid[rdr[lx]]

must also move from LexerImpl to Lexer.

At first these changes seem shocking and bad, but they are necessary and harmless.
Necessary, because immediately after the call Lexer.Init(lx, rd) both lx and
rd are visible expressions and rd = rdr[lx]. Therefore, if the field rdr were not
visible, abstract aliasing would occur by the definition in Section 1. Harmless, because
the exposure of the rdr field in the interface does not entail any real loss
of abstraction: the rep clause of lvalid is still hidden in the implementation. In
fact, the pivot field is effectively read-only to clients: since the dependency is visible,
changes to rdr[lx] are known to affect lvalid[lx], but, since the rep is not
visible, it is impossible to prove that a change to rdr[lx] maintains lvalid[lx]. A
more subtle example of this "read-only by specification" technique is described by
Leino and Nelson [LN98a].

In summary, we can soundly allow pivots to cross abstraction boundaries by 
placing the dependency in the interface: The abstraction representation remains 
hidden in the implementation. If a perverse client tried to close the reader from 
under the lexer, the checker would detect that this action compromises the validity 
of the lexer, because the dependency is in scope. Indeed, we have structured the 
annotation language to railroad the programmer into this pattern: To verify the 
body of a procedure that captures a pivot, the programmer must specify it using the 
captures into notation, and thus must declare the pivot field in the interface, and 
therefore must declare the dependency there as well. 

7 What we have not achieved 

Our approach carefully controls the global variables and fields that can contain a
pivot b[o], but it doesn't say anything about how the owner of this pivot, o, might
be reached. For example, our methodology does not prevent the code fragment (4)
that initializes rd and lx from being followed by a procedure call like

P(lx, rd)

where P is a procedure that is implemented where the rdr field is not visible.
In this scenario, abstract aliasing would occur in the implementation of P. Even
more alarmingly, abstract aliasing could occur if the first parameter to P were any
object from which lx is reachable, or if lx were reachable from a global variable
visible to the implementation of P.
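To see why this gap is hard to close, consider a conservative run-time analogue. The Python sketch below is hypothetical and is not part of our methodology: it asks whether a callee that cannot read the pivot field could nevertheless reach both some owner o and the pivot b[o], by following every other field from its arguments. The field lex and the object holder are invented for the illustration.

```python
def reachable(roots, fields, hidden):
    """Objects reachable from roots by following fields whose names are not in hidden."""
    seen, stack = set(), [r for r in roots if r is not None]
    while stack:
        o = stack.pop()
        if o in seen:
            continue
        seen.add(o)
        for name, f in fields.items():
            v = f.get(o)
            if name not in hidden and v is not None:
                stack.append(v)
    return seen


def may_abstractly_alias(args, pivot, fields):
    """True if some owner o and its pivot fields[pivot][o] are both reachable
    from args without reading the (invisible) pivot field."""
    reach = reachable(args, fields, hidden={pivot})
    return any(fields[pivot].get(o) in reach for o in reach)


lx, rd, holder = object(), object(), object()
fields = {"rdr": {lx: rd}, "lex": {holder: lx}}  # holder.lex = lx is hypothetical
print(may_abstractly_alias([lx, rd], "rdr", fields))      # True:  P(lx, rd)
print(may_abstractly_alias([holder, rd], "rdr", fields))  # True:  lx reachable from holder
print(may_abstractly_alias([lx], "rdr", fields))          # False: rd reachable only via rdr
```

Such a whole-heap reachability question is exactly what a modular, static discipline cannot answer cheaply, which is why the rules above leave this case open.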

8 Discussion of variations 

To smooth the exposition, we have presented our methodology in strict and simple 
terms. In this section, we sketch some possible variations. 

First, the rule against returning pivots is more drastic than necessary. For example,
it would be perfectly sound to forbid the return only of those pivots that
were not among a procedure's in-parameters. The more liberal rule is sound because
simple upward leaking occurs only when a procedure provides a caller with
access to a pivot that was not accessible before the call. In fact, we expect a liberalization
along these lines to be very convenient; for example, to avoid clashing with
the convention that initialization methods return the object initialized. The reader
may have wondered why the conjunct virgin[lx] was present in the precondition
of Lexer.Init: without the precondition, lx might already be a pivot on entry, and then
no implementation that uses the initialization convention (returning lx) will be able
to establish the postcondition (3), ¬pivot'[result].
By liberalizing the rule about returning pivots, this restrictive precondition could
be omitted.
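The difference between the strict rule, postcondition (3), and the more liberal variant can be stated as a small predicate on the returned value. The Python function may_return below is a hypothetical illustration, assuming that the set of pivots in the post-state and the procedure's in-parameters are known.

```python
def may_return(value, pivots_after, in_params, liberal=False):
    """Strict rule (3): r = nil ∨ ¬pivot'[r].
    Liberal variant: additionally allow a pivot that was one of the in-parameters."""
    if value is None or value not in pivots_after:
        return True
    return liberal and any(value is p for p in in_params)


lx, rd = object(), object()
pivots = {lx, rd}  # suppose lx was already a pivot when Lexer.Init was called
print(may_return(lx, pivots, in_params=[lx, rd]))                # False under the strict rule
print(may_return(lx, pivots, in_params=[lx, rd], liberal=True))  # True: lx was passed in
```

Under the liberal rule the caller gains no access it did not already have, which is the soundness argument given above.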

Another way to liberalize the rule against returning pivots would be to have
a dual to the captures into annotation, which would allow returning a pivot provided
that the specification states explicitly what is being returned. For example, if
x is an out-parameter or a global variable, the annotation

returns E as x

could cause the conjunct ¬pivot'[x] to be omitted from the postcondition, and in
its place add the conjunct

ensures x' = E

This liberalization is sound for the same reason as the other liberalization is sound: 
returning a pivot does not cause abstract aliasing if the associated pivot field is in 
scope in the caller. We found several procedures in the NI2 indexing library for 
which returns as , or something like it, would be useful. 

Secondly, it would be perfectly sound to weaken the precondition in the desugaring
of captures o into b[t] from requires virgin[o] to requires ¬plenary[o].
However, we doubt that this would be convenient in practice, since, for one thing,
the precondition of virginity is generally needed to prove that an initialization of a
pivot field preserves the disjoint ranges requirement.

Thirdly, one can imagine cases where a procedure captures a parameter into a 
pivot field, but where it is inconvenient to name the owner at the point of declaration 
of the procedure. This suggests a notation like 

captures o into b 

whose desugaring has the postcondition

$$\langle \exists t :: b'[t] = o \rangle$$

We have encountered one example program in which this notation might be useful. 

9 Related work 

The central problem addressed in this paper is to find an effective methodology for
dealing with rep exposure that allows what is useful (passing a reader to an initializing
method of a lexer) while preventing what is harmful (invisibly invalidating a
lexer by closing the reader under it).
We learned the term rep exposure from the CLU community, but this community
seems not to have developed any formal methodology for avoiding the
problem [LG86].

Keeping track of references is a crucial difficulty in many kinds of static program
analyses. There are several papers introducing annotation techniques related
to ours. The Larch C Lint checking tool (LCLint) warns of places where a C program
transfers mutable components across abstraction boundaries [Eva96]. The
annotations for turning off inappropriate warnings from LCLint are similar to, but
less precise than, those used in our methodology. In the course of developing a
tool to assist in the structural change of programs, Chan, Boyland, and Scherlis
have encountered problems similar to the rep exposure problem, and have developed
a number of annotations similar to ours [CBS98].

Our pivot and virgin attributes are similar to the unique and free modes of John
Hogg's Islands paper [Hog91]. Almeida's Balloons paper also outlines a programming
discipline for programming with object references [Alm97]. However, the
Islands and Balloons papers are not directed toward the rep exposure problem and
do not consider the connection between a reader and its enclosing lexer, and therefore
do not provide a solution to the problem we are addressing in this paper.

The Flexible Alias Protection paper of Noble, Vitek, and Potter describes programming
rules that prevent rep exposure [NVP98]. But the rules are too strict:
they outlaw the lexer/reader program, because in this program a part of the representation
of the lexer is referenced from outside the scope of the lexer's implementation.

Jones describes simple formal rules for avoiding interference, which is related 
to abstract aliasing, but his rules are too strict to be useful [Jon96]. 

The methodology described in this paper can be more precise because it is 
based on dependencies [LN98a]. Another of our contributions is that our paper is 
the first treatment of the rep exposure problem that contains no pictures. 

10 Conclusions 

It is difficult to design a programming discipline that prevents the unsound cases 
of rep exposure but allows useful and sound object-oriented programming styles. 
The theory of dependencies sheds new light on this problem. In particular: 

• Rep exposure is usually defined as the transfer of mutable components of
an abstract data type across the abstraction boundary for that type. But in
fact, only those components that correspond to dynamic dependencies are
problematical; components that correspond to static dependencies can be
ignored. From this observation we have traced the source of unsoundness to
the notion of abstract aliasing, precisely defined in terms of dependencies.

• If some abstraction a[t] depends on the mutable value of some component
b[t] (that is, if depends a[t] on c[b[t]]), a checker can detect the interference
between updates to a[t] and c[b[t]] even if the details of the representation
of a are not visible. All that is required is for the dependency to
be in scope. This observation allows pivots to cross abstraction boundaries
provided that the associated pivot field and dependency are in scope in the
relevant interface.

Our analysis has led us to a conclusion that is shockingly different from the
other approaches that we know. Instead of forbidding pivot values from ever crossing
an abstraction boundary, we allow them to, provided that the pivot field is
declared in the interface to the abstraction. Soundness is saved by declaring the
dependency in the interface as well.

We have developed these observations into a set of rules for avoiding abstract 
aliasing and have described how to enforce them mechanically. The rules prevent 
many cases of abstract aliasing, but not all cases. 